Overview

Dataset statistics

Number of variables: 19
Number of observations: 168592
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 30.7 MiB
Average record size in memory: 191.1 B

Variable types

NUM: 13
CAT: 4
BOOL: 2

Reproduction

Analysis started: 2020-06-01 04:05:05.879186
Analysis finished: 2020-06-01 04:05:45.889251
Duration: 40.01 seconds
Version: pandas-profiling v2.8.0
Command line: pandas_profiling --config_file config.yaml [YOUR_FILE.csv]
Download configuration: config.yaml

Warnings

artists has a high cardinality: 33268 distinct values [High cardinality]
name has a high cardinality: 131361 distinct values [High cardinality]
release_date has a high cardinality: 10813 distinct values [High cardinality]
name is uniformly distributed [Uniform]
id has unique values [Unique]
instrumentalness has 44077 (26.1%) zeros [Zeros]
key has 21145 (12.5%) zeros [Zeros]
popularity has 25628 (15.2%) zeros [Zeros]
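
The zero percentages in these warnings follow from the zero counts and the number of observations reported in the overview. The sketch below reproduces that arithmetic; the helper name `share_percent` is illustrative and not part of pandas-profiling, and the rounding to one decimal is an assumption about how the report formats percentages.

```python
# Counts copied from the report above; the helper name is hypothetical.
N_OBSERVATIONS = 168592


def share_percent(count, total=N_OBSERVATIONS):
    """Return count as a percentage of total, rounded to one decimal place."""
    return round(100 * count / total, 1)


zeros = {"instrumentalness": 44077, "key": 21145, "popularity": 25628}
for column, count in zeros.items():
    print(f"{column}: {share_percent(count)}% zeros")
```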

Variables

acousticness
Real number (ℝ≥0)

Distinct count: 4715
Unique (%): 2.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.5013601471548472
Minimum: 0.0
Maximum: 0.996
Zeros: 16
Zeros (%): < 0.1%
Memory size: 1.3 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0.00143
Q1: 0.0978
Median: 0.515
Q3: 0.896
95-th percentile: 0.992
Maximum: 0.996
Range: 0.996
Interquartile range (IQR): 0.7982

Descriptive statistics

Standard deviation: 0.3779929264
Coefficient of variation (CV): 0.7539349279
Kurtosis: -1.61903427
Mean: 0.5013601472
Median Absolute Deviation (MAD): 0.398
Skewness: -0.028073633
Sum: 84525.30993
Variance: 0.1428786524

Histogram with fixed size bins (bins=10)

Common values

Value | Count | Frequency (%)
0.995 | 3146 | 1.9%
0.994 | 2354 | 1.4%
0.993 | 1811 | 1.1%
0.992 | 1562 | 0.9%
0.991 | 1310 | 0.8%
0.99 | 1215 | 0.7%
0.996 | 1100 | 0.7%
0.989 | 1045 | 0.6%
0.988 | 950 | 0.6%
0.987 | 834 | 0.5%
Other values (4705) | 153265 | 90.9%

Minimum 5 values

Value | Count | Frequency (%)
0 | 16 | < 0.1%
1e-06 | 1 | < 0.1%
1.01e-06 | 3 | < 0.1%
1.03e-06 | 1 | < 0.1%
1.05e-06 | 2 | < 0.1%

Maximum 5 values

Value | Count | Frequency (%)
0.996 | 1100 | 0.7%
0.995 | 3146 | 1.9%
0.994 | 2354 | 1.4%
0.993 | 1811 | 1.1%
0.992 | 1562 | 0.9%
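
Several of the statistics reported for each numeric variable are derived from the others. As a minimal sketch, assuming the usual definitions (CV = standard deviation / mean, IQR = Q3 - Q1, range = maximum - minimum, variance = standard deviation squared), the acousticness figures above can be cross-checked as follows. The variable names are illustrative.

```python
import math

# Values copied from the acousticness statistics in this report.
std, mean = 0.3779929264, 0.5013601472
q1, q3 = 0.0978, 0.896
minimum, maximum = 0.0, 0.996

cv = std / mean                  # coefficient of variation
iqr = q3 - q1                    # interquartile range
value_range = maximum - minimum  # range
variance = std ** 2              # variance

# Compare against the figures printed in the report.
print(math.isclose(cv, 0.7539349279, rel_tol=1e-6))
print(math.isclose(iqr, 0.7982, rel_tol=1e-9))
print(math.isclose(variance, 0.1428786524, rel_tol=1e-6))
```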
 

artists
Categorical

HIGH CARDINALITY

Distinct count: 33268
Unique (%): 19.7%
Missing: 0
Missing (%): 0.0%
Memory size: 1.3 MiB

Value | Count | Frequency (%)
['Francisco Canaro'] | 956 | 0.6%
['Ignacio Corsini'] | 635 | 0.4%
['Frank Sinatra'] | 596 | 0.4%
['The Rolling Stones'] | 525 | 0.3%
['Bob Dylan'] | 525 | 0.3%
['Johnny Cash'] | 499 | 0.3%
['Elvis Presley'] | 492 | 0.3%
['The Beach Boys'] | 474 | 0.3%
['Francisco Canaro', 'Charlo'] | 459 | 0.3%
['Queen'] | 442 | 0.3%
Other values (33258) | 162989 | 96.7%

Length

Max length: 661
Median length: 17
Mean length: 23.46094121
Min length: 5

danceability
Real number (ℝ≥0)

Distinct count: 1240
Unique (%): 0.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.5336484073977413
Minimum: 0.0
Maximum: 0.988
Zeros: 146
Zeros (%): 0.1%
Memory size: 1.3 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0.228
Q1: 0.412
Median: 0.543
Q3: 0.662
95-th percentile: 0.812
Maximum: 0.988
Range: 0.988
Interquartile range (IQR): 0.25

Descriptive statistics

Standard deviation: 0.1759189486
Coefficient of variation (CV): 0.3296532814
Kurtosis: -0.4275884837
Mean: 0.5336484074
Median Absolute Deviation (MAD): 0.124
Skewness: -0.1880462156
Sum: 89968.8523
Variance: 0.03094747648

Histogram with fixed size bins (bins=10)

Common values

Value | Count | Frequency (%)
0.565 | 436 | 0.3%
0.612 | 407 | 0.2%
0.559 | 407 | 0.2%
0.578 | 405 | 0.2%
0.545 | 405 | 0.2%
0.556 | 397 | 0.2%
0.602 | 394 | 0.2%
0.567 | 393 | 0.2%
0.546 | 391 | 0.2%
0.563 | 391 | 0.2%
Other values (1230) | 164566 | 97.6%

Minimum 5 values

Value | Count | Frequency (%)
0 | 146 | 0.1%
0.0551 | 1 | < 0.1%
0.0559 | 2 | < 0.1%
0.0562 | 1 | < 0.1%
0.0569 | 2 | < 0.1%

Maximum 5 values

Value | Count | Frequency (%)
0.988 | 1 | < 0.1%
0.986 | 2 | < 0.1%
0.985 | 1 | < 0.1%
0.982 | 1 | < 0.1%
0.98 | 3 | < 0.1%
 

duration_ms
Real number (ℝ≥0)

Distinct count: 49513
Unique (%): 29.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 232701.55738113314
Minimum: 5108
Maximum: 5403500
Zeros: 0
Zeros (%): 0.0%
Memory size: 1.3 MiB

Quantile statistics

Minimum: 5108
5-th percentile: 116893
Q1: 172160
Median: 209133
Q3: 263707
95-th percentile: 413200
Maximum: 5403500
Range: 5398392
Interquartile range (IQR): 91547

Descriptive statistics

Standard deviation: 122392.1252
Coefficient of variation (CV): 0.5259617794
Kurtosis: 120.0543702
Mean: 232701.5574
Median Absolute Deviation (MAD): 43826.5
Skewness: 6.645153969
Sum: 3.923162096e+10
Variance: 1.497983231e+10

Histogram with fixed size bins (bins=10)

Common values

Value | Count | Frequency (%)
192000 | 57 | < 0.1%
180000 | 54 | < 0.1%
186000 | 51 | < 0.1%
240000 | 50 | < 0.1%
184000 | 49 | < 0.1%
169000 | 46 | < 0.1%
160000 | 46 | < 0.1%
170000 | 46 | < 0.1%
168000 | 46 | < 0.1%
200000 | 43 | < 0.1%
Other values (49503) | 168104 | 99.7%

Minimum 5 values

Value | Count | Frequency (%)
5108 | 1 | < 0.1%
5991 | 1 | < 0.1%
6362 | 1 | < 0.1%
6467 | 1 | < 0.1%
8853 | 2 | < 0.1%

Maximum 5 values

Value | Count | Frequency (%)
5403500 | 1 | < 0.1%
4270034 | 1 | < 0.1%
4269407 | 1 | < 0.1%
4164685 | 1 | < 0.1%
4120258 | 2 | < 0.1%
 

energy
Real number (ℝ≥0)

Distinct count: 2362
Unique (%): 1.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.4885770204446237
Minimum: 0.0
Maximum: 1.0
Zeros: 7
Zeros (%): < 0.1%
Memory size: 1.3 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0.0742
Q1: 0.265
Median: 0.48
Q3: 0.709
95-th percentile: 0.925
Maximum: 1
Range: 1
Interquartile range (IQR): 0.444

Descriptive statistics

Standard deviation: 0.2673462485
Coefficient of variation (CV): 0.5471936611
Kurtosis: -1.088554174
Mean: 0.4885770204
Median Absolute Deviation (MAD): 0.222
Skewness: 0.0750147121
Sum: 82370.17703
Variance: 0.07147401661

Histogram with fixed size bins (bins=10)

Common values

Value | Count | Frequency (%)
0.254 | 236 | 0.1%
0.341 | 236 | 0.1%
0.306 | 235 | 0.1%
0.459 | 232 | 0.1%
0.31 | 230 | 0.1%
0.32 | 230 | 0.1%
0.304 | 229 | 0.1%
0.497 | 229 | 0.1%
0.296 | 227 | 0.1%
0.325 | 226 | 0.1%
Other values (2352) | 166282 | 98.6%

Minimum 5 values

Value | Count | Frequency (%)
0 | 7 | < 0.1%
1.99e-05 | 2 | < 0.1%
2.01e-05 | 6 | < 0.1%
2.02e-05 | 5 | < 0.1%
2.03e-05 | 14 | < 0.1%

Maximum 5 values

Value | Count | Frequency (%)
1 | 21 | < 0.1%
0.999 | 29 | < 0.1%
0.998 | 37 | < 0.1%
0.997 | 54 | < 0.1%
0.996 | 68 | < 0.1%
 

explicit
Boolean

Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.3 MiB

Value | Count | Frequency (%)
0 | 156535 | 92.8%
1 | 12057 | 7.2%
 

id
Categorical

UNIQUE

Distinct count: 168592
Unique (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 1.3 MiB

Value | Count | Frequency (%)
1ULa3GfdMKs0MfRpm6xVlu | 1 | < 0.1%
1yl4vQfgRsRNIcFAmnGaqJ | 1 | < 0.1%
2Wc2tcQl7cPetPKeXH3GD3 | 1 | < 0.1%
1FaiVQR1BUHxttxYUMAQiW | 1 | < 0.1%
1KNNTdw7SzJ90p6RXq4kGE | 1 | < 0.1%
7ij4pBHTk9SvyallMZOp9E | 1 | < 0.1%
5n7zm9p6GIDo3GkJiYGbSN | 1 | < 0.1%
1hpdVGEaZovrlcnY0Yqcc9 | 1 | < 0.1%
4nhpxeFxGQtMZTK0ojcHF3 | 1 | < 0.1%
3q4aF4bp248tWr7lAyoFHe | 1 | < 0.1%
Other values (168582) | 168582 | > 99.9%

Length

Max length: 22
Median length: 22
Mean length: 22
Min length: 22

instrumentalness
Real number (ℝ≥0)

ZEROS

Distinct count: 5402
Unique (%): 3.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.1694756187246726
Minimum: 0.0
Maximum: 1.0
Zeros: 44077
Zeros (%): 26.1%
Memory size: 1.3 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 0.000264
Q3: 0.111
95-th percentile: 0.906
Maximum: 1
Range: 1
Interquartile range (IQR): 0.111

Descriptive statistics

Standard deviation: 0.3153829154
Coefficient of variation (CV): 1.860933849
Kurtosis: 0.8701212171
Mean: 0.1694756187
Median Absolute Deviation (MAD): 0.000264
Skewness: 1.609969699
Sum: 28572.23351
Variance: 0.09946638333

Histogram with fixed size bins (bins=10)

Common values

Value | Count | Frequency (%)
0 | 44077 | 26.1%
0.917 | 201 | 0.1%
0.913 | 198 | 0.1%
0.916 | 197 | 0.1%
0.922 | 191 | 0.1%
0.901 | 188 | 0.1%
0.904 | 187 | 0.1%
0.894 | 186 | 0.1%
0.919 | 186 | 0.1%
0.914 | 186 | 0.1%
Other values (5392) | 122795 | 72.8%

Minimum 5 values

Value | Count | Frequency (%)
0 | 44077 | 26.1%
1e-06 | 27 | < 0.1%
1.01e-06 | 64 | < 0.1%
1.02e-06 | 88 | 0.1%
1.03e-06 | 69 | < 0.1%

Maximum 5 values

Value | Count | Frequency (%)
1 | 10 | < 0.1%
0.999 | 16 | < 0.1%
0.998 | 10 | < 0.1%
0.997 | 3 | < 0.1%
0.996 | 6 | < 0.1%
 

key
Real number (ℝ≥0)

ZEROS

Distinct count: 12
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 5.196794628452121
Minimum: 0
Maximum: 11
Zeros: 21145
Zeros (%): 12.5%
Memory size: 1.3 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 2
Median: 5
Q3: 8
95-th percentile: 11
Maximum: 11
Range: 11
Interquartile range (IQR): 6

Descriptive statistics

Standard deviation: 3.504587137
Coefficient of variation (CV): 0.6743747613
Kurtosis: -1.269440097
Mean: 5.196794628
Median Absolute Deviation (MAD): 3
Skewness: 0.005418529507
Sum: 876138
Variance: 12.282131

Histogram with fixed size bins (bins=10)

Common values

Value | Count | Frequency (%)
0 | 21145 | 12.5%
7 | 20559 | 12.2%
2 | 18836 | 11.2%
9 | 17602 | 10.4%
5 | 16302 | 9.7%
4 | 12957 | 7.7%
1 | 12587 | 7.5%
10 | 11989 | 7.1%
8 | 10679 | 6.3%
11 | 10208 | 6.1%
Other values (2) | 15728 | 9.3%

Minimum 5 values

Value | Count | Frequency (%)
0 | 21145 | 12.5%
1 | 12587 | 7.5%
2 | 18836 | 11.2%
3 | 7256 | 4.3%
4 | 12957 | 7.7%

Maximum 5 values

Value | Count | Frequency (%)
11 | 10208 | 6.1%
10 | 11989 | 7.1%
9 | 17602 | 10.4%
8 | 10679 | 6.3%
7 | 20559 | 12.2%
 

liveness
Real number (ℝ≥0)

Distinct count: 1741
Unique (%): 1.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.20515064279443862
Minimum: 0.0
Maximum: 1.0
Zeros: 9
Zeros (%): < 0.1%
Memory size: 1.3 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0.0596
Q1: 0.0982
Median: 0.134
Q3: 0.259
95-th percentile: 0.614
Maximum: 1
Range: 1
Interquartile range (IQR): 0.1608

Descriptive statistics

Standard deviation: 0.1758962449
Coefficient of variation (CV): 0.8574004081
Kurtosis: 5.072045161
Mean: 0.2051506428
Median Absolute Deviation (MAD): 0.052
Skewness: 2.17595718
Sum: 34586.75717
Variance: 0.03093948895

Histogram with fixed size bins (bins=10)

Common values

Value | Count | Frequency (%)
0.111 | 1861 | 1.1%
0.11 | 1653 | 1.0%
0.109 | 1646 | 1.0%
0.108 | 1624 | 1.0%
0.107 | 1535 | 0.9%
0.106 | 1495 | 0.9%
0.105 | 1490 | 0.9%
0.112 | 1484 | 0.9%
0.104 | 1379 | 0.8%
0.103 | 1374 | 0.8%
Other values (1731) | 153051 | 90.8%

Minimum 5 values

Value | Count | Frequency (%)
0 | 9 | < 0.1%
0.00967 | 1 | < 0.1%
0.0101 | 1 | < 0.1%
0.0103 | 1 | < 0.1%
0.0116 | 1 | < 0.1%

Maximum 5 values

Value | Count | Frequency (%)
1 | 1 | < 0.1%
0.999 | 1 | < 0.1%
0.998 | 2 | < 0.1%
0.997 | 6 | < 0.1%
0.996 | 3 | < 0.1%
 

loudness
Real number (ℝ)

Distinct count: 25354
Unique (%): 15.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: -11.358179795008066
Minimum: -60.0
Maximum: 3.855
Zeros: 0
Zeros (%): 0.0%
Memory size: 1.3 MiB

Quantile statistics

Minimum: -60
5-th percentile: -21.961
Q1: -14.388
Median: -10.466
Q3: -7.135
95-th percentile: -4.11
Maximum: 3.855
Range: 63.855
Interquartile range (IQR): 7.253

Descriptive statistics

Standard deviation: 5.670175788
Coefficient of variation (CV): -0.4992151815
Kurtosis: 2.042117824
Mean: -11.3581798
Median Absolute Deviation (MAD): 3.565
Skewness: -1.109694971
Sum: -1914898.248
Variance: 32.15089346

Histogram with fixed size bins (bins=10)

Common values

Value | Count | Frequency (%)
-6.942 | 28 | < 0.1%
-10.107 | 26 | < 0.1%
-7.775 | 26 | < 0.1%
-7.436 | 26 | < 0.1%
-11.815 | 26 | < 0.1%
-8.789 | 25 | < 0.1%
-8.32 | 25 | < 0.1%
-9.184 | 25 | < 0.1%
-7.047 | 25 | < 0.1%
-7.744 | 25 | < 0.1%
Other values (25344) | 168335 | 99.8%

Minimum 5 values

Value | Count | Frequency (%)
-60 | 7 | < 0.1%
-55 | 1 | < 0.1%
-54.837 | 1 | < 0.1%
-54.376 | 1 | < 0.1%
-51.123 | 1 | < 0.1%

Maximum 5 values

Value | Count | Frequency (%)
3.855 | 1 | < 0.1%
3.744 | 1 | < 0.1%
2.799 | 1 | < 0.1%
1.963 | 1 | < 0.1%
1.83 | 1 | < 0.1%
 

mode
Boolean

Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.3 MiB

Value | Count | Frequency (%)
1 | 119607 | 70.9%
0 | 48985 | 29.1%
 

name
Categorical

HIGH CARDINALITY
UNIFORM

Distinct count: 131361
Unique (%): 77.9%
Missing: 0
Missing (%): 0.0%
Memory size: 1.3 MiB

Value | Count | Frequency (%)
Summertime | 58 | < 0.1%
Overture | 47 | < 0.1%
Home | 39 | < 0.1%
You | 35 | < 0.1%
Stay | 33 | < 0.1%
I Love You | 32 | < 0.1%
Runaway | 32 | < 0.1%
Angel | 31 | < 0.1%
Forever | 30 | < 0.1%
Time After Time | 29 | < 0.1%
Other values (131351) | 168226 | 99.8%

Length

Max length: 217
Median length: 18
Mean length: 23.52237354
Min length: 1

popularity
Real number (ℝ≥0)

ZEROS

Distinct count: 100
Unique (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 31.626862484578155
Minimum: 0
Maximum: 100
Zeros: 25628
Zeros (%): 15.2%
Memory size: 1.3 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 13
Median: 34
Q3: 48
95-th percentile: 65
Maximum: 100
Range: 100
Interquartile range (IQR): 35

Descriptive statistics

Standard deviation: 21.39325951
Coefficient of variation (CV): 0.676426867
Kurtosis: -0.9968605208
Mean: 31.62686248
Median Absolute Deviation (MAD): 16
Skewness: -0.0276383837
Sum: 5332036
Variance: 457.6715522

Histogram with fixed size bins (bins=10)

Common values

Value | Count | Frequency (%)
0 | 25628 | 15.2%
41 | 3207 | 1.9%
44 | 3159 | 1.9%
42 | 3141 | 1.9%
43 | 3089 | 1.8%
40 | 3025 | 1.8%
45 | 2952 | 1.8%
39 | 2918 | 1.7%
38 | 2895 | 1.7%
37 | 2873 | 1.7%
Other values (90) | 115705 | 68.6%

Minimum 5 values

Value | Count | Frequency (%)
0 | 25628 | 15.2%
1 | 2385 | 1.4%
2 | 1668 | 1.0%
3 | 1338 | 0.8%
4 | 1135 | 0.7%

Maximum 5 values

Value | Count | Frequency (%)
100 | 1 | < 0.1%
98 | 1 | < 0.1%
97 | 3 | < 0.1%
96 | 2 | < 0.1%
95 | 2 | < 0.1%
 

release_date
Categorical

HIGH CARDINALITY

Distinct count: 10813
Unique (%): 6.4%
Missing: 0
Missing (%): 0.0%
Memory size: 1.3 MiB

Value | Count | Frequency (%)
1940-01-01 | 1279 | 0.8%
1949 | 1251 | 0.7%
1945 | 1106 | 0.7%
1948 | 1053 | 0.6%
1930-01-01 | 1047 | 0.6%
1956 | 997 | 0.6%
1951 | 987 | 0.6%
1950-01-01 | 970 | 0.6%
1957 | 920 | 0.5%
1946 | 903 | 0.5%
Other values (10803) | 158079 | 93.8%

Length

Max length: 10
Median length: 10
Mean length: 8.230289693
Min length: 4

speechiness
Real number (ℝ≥0)

Distinct count: 1614
Unique (%): 1.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.08361604702476985
Minimum: 0.0
Maximum: 0.968
Zeros: 146
Zeros (%): 0.1%
Memory size: 1.3 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0.0281
Q1: 0.0348
Median: 0.0446
Q3: 0.0723
95-th percentile: 0.288
Maximum: 0.968
Range: 0.968
Interquartile range (IQR): 0.0375

Descriptive statistics

Standard deviation: 0.1199168742
Coefficient of variation (CV): 1.434137088
Kurtosis: 26.36340493
Mean: 0.08361604702
Median Absolute Deviation (MAD): 0.0126
Skewness: 4.674015274
Sum: 14096.9966
Variance: 0.01438005672

Histogram with fixed size bins (bins=10)

Common values

Value | Count | Frequency (%)
0.0347 | 575 | 0.3%
0.0334 | 569 | 0.3%
0.0337 | 567 | 0.3%
0.033 | 566 | 0.3%
0.0319 | 565 | 0.3%
0.0332 | 561 | 0.3%
0.0352 | 559 | 0.3%
0.034 | 558 | 0.3%
0.0344 | 554 | 0.3%
0.0348 | 553 | 0.3%
Other values (1604) | 162965 | 96.7%

Minimum 5 values

Value | Count | Frequency (%)
0 | 146 | 0.1%
0.0222 | 1 | < 0.1%
0.0223 | 3 | < 0.1%
0.0224 | 5 | < 0.1%
0.0225 | 5 | < 0.1%

Maximum 5 values

Value | Count | Frequency (%)
0.968 | 1 | < 0.1%
0.967 | 2 | < 0.1%
0.966 | 9 | < 0.1%
0.965 | 11 | < 0.1%
0.964 | 18 | < 0.1%
 

tempo
Real number (ℝ≥0)

Distinct count: 84203
Unique (%): 49.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 116.91829485977982
Minimum: 0.0
Maximum: 244.091
Zeros: 146
Zeros (%): 0.1%
Memory size: 1.3 MiB

Quantile statistics

Minimum: 0
5-th percentile: 74.08855
Q1: 93.50075
Median: 114.795
Q3: 135.7335
95-th percentile: 174.47425
Maximum: 244.091
Range: 244.091
Interquartile range (IQR): 42.23275

Descriptive statistics

Standard deviation: 30.7265272
Coefficient of variation (CV): 0.2628034153
Kurtosis: -0.07475221339
Mean: 116.9182949
Median Absolute Deviation (MAD): 21.126
Skewness: 0.4457320072
Sum: 19711489.17
Variance: 944.1194736

Histogram with fixed size bins (bins=10)

Common values

Value | Count | Frequency (%)
0 | 146 | 0.1%
120 | 20 | < 0.1%
120.005 | 18 | < 0.1%
120.012 | 18 | < 0.1%
119.994 | 17 | < 0.1%
129.995 | 16 | < 0.1%
94.006 | 16 | < 0.1%
119.969 | 16 | < 0.1%
130.009 | 15 | < 0.1%
119.997 | 15 | < 0.1%
Other values (84193) | 168295 | 99.8%

Minimum 5 values

Value | Count | Frequency (%)
0 | 146 | 0.1%
30.946 | 1 | < 0.1%
31.988 | 1 | < 0.1%
32.466 | 1 | < 0.1%
32.8 | 1 | < 0.1%

Maximum 5 values

Value | Count | Frequency (%)
244.091 | 1 | < 0.1%
243.507 | 1 | < 0.1%
243.372 | 1 | < 0.1%
238.895 | 1 | < 0.1%
236.799 | 1 | < 0.1%
 

valence
Real number (ℝ≥0)

Distinct count: 1747
Unique (%): 1.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.5284973540565387
Minimum: 0.0
Maximum: 1.0
Zeros: 184
Zeros (%): 0.1%
Memory size: 1.3 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0.088055
Q1: 0.315
Median: 0.539
Q3: 0.749
95-th percentile: 0.938
Maximum: 1
Range: 1
Interquartile range (IQR): 0.434

Descriptive statistics

Standard deviation: 0.2644568858
Coefficient of variation (CV): 0.5003939638
Kurtosis: -1.07128991
Mean: 0.5284973541
Median Absolute Deviation (MAD): 0.216
Skewness: -0.1088511723
Sum: 89100.42592
Variance: 0.06993744447

Histogram with fixed size bins (bins=10)

Common values

Value | Count | Frequency (%)
0.961 | 721 | 0.4%
0.962 | 589 | 0.3%
0.963 | 527 | 0.3%
0.964 | 467 | 0.3%
0.965 | 406 | 0.2%
0.96 | 389 | 0.2%
0.966 | 356 | 0.2%
0.967 | 312 | 0.2%
0.968 | 268 | 0.2%
0.559 | 253 | 0.2%
Other values (1737) | 164304 | 97.5%

Minimum 5 values

Value | Count | Frequency (%)
0 | 184 | 0.1%
1e-05 | 85 | 0.1%
6.41e-05 | 1 | < 0.1%
0.00049 | 1 | < 0.1%
0.000537 | 1 | < 0.1%

Maximum 5 values

Value | Count | Frequency (%)
1 | 3 | < 0.1%
0.999 | 2 | < 0.1%
0.998 | 1 | < 0.1%
0.996 | 2 | < 0.1%
0.995 | 1 | < 0.1%
 

year
Real number (ℝ≥0)

Distinct count: 100
Unique (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1977.457773797096
Minimum: 1921
Maximum: 2020
Zeros: 0
Zeros (%): 0.0%
Memory size: 1.3 MiB

Quantile statistics

Minimum: 1921
5-th percentile: 1935
Q1: 1957
Median: 1978
Q3: 1999
95-th percentile: 2016
Maximum: 2020
Range: 99
Interquartile range (IQR): 42

Descriptive statistics

Standard deviation: 25.4067568
Coefficient of variation (CV): 0.01284819183
Kurtosis: -1.020188288
Mean: 1977.457774
Median Absolute Deviation (MAD): 21
Skewness: -0.1345158309
Sum: 333383561
Variance: 645.5032913

Histogram with fixed size bins (bins=10)

Common values

Value | Count | Frequency (%)
1970 | 2000 | 1.2%
1975 | 2000 | 1.2%
1968 | 2000 | 1.2%
1969 | 2000 | 1.2%
1971 | 2000 | 1.2%
1972 | 2000 | 1.2%
1973 | 2000 | 1.2%
1974 | 2000 | 1.2%
1976 | 2000 | 1.2%
1985 | 2000 | 1.2%
Other values (90) | 148592 | 88.1%

Minimum 5 values

Value | Count | Frequency (%)
1921 | 128 | 0.1%
1922 | 72 | < 0.1%
1923 | 169 | 0.1%
1924 | 236 | 0.1%
1925 | 263 | 0.2%

Maximum 5 values

Value | Count | Frequency (%)
2020 | 1681 | 1.0%
2019 | 1889 | 1.1%
2018 | 2000 | 1.2%
2017 | 1990 | 1.2%
2016 | 1910 | 1.1%
 

Interactions

Correlations

Pearson's r

Pearson's correlation coefficient (r) measures the linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables, so for a linear relationship the steepness of the line does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
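
A minimal sketch of this calculation in plain Python follows. The function name and the toy values are illustrative and are not taken from the profiled dataset; population (1/n) covariance and standard deviations are assumed, which gives the same ratio as the sample (1/(n-1)) versions.

```python
import math


def pearson_r(xs, ys):
    """Covariance of xs and ys divided by the product of their standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Population covariance and standard deviations; the 1/n factors cancel in the ratio.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    return cov / (std_x * std_y)


# Toy values for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 8.0, 9.8]
r = pearson_r(xs, ys)
# Shifting xs and rescaling it by a positive factor leaves r unchanged.
r_shifted = pearson_r([10 * x + 3 for x in xs], ys)
print(r, r_shifted)
```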

Spearman's ρ

Spearman's rank correlation coefficient (ρ) measures the monotonic correlation between two variables, and is therefore better than Pearson's r at catching nonlinear monotonic correlations. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of the standard deviations of those rank variables.
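
As a sketch of that definition, the code below ranks each variable and then applies the Pearson formula to the ranks. Tied values receive the average of their rank positions, which is a common convention assumed here; the function names and toy data are illustrative.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson_r(xs, ys):
    """Covariance over the product of standard deviations (1/n factors cancel)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def spearman_rho(xs, ys):
    """Pearson's r computed on the rank variables of xs and ys."""
    return pearson_r(average_ranks(xs), average_ranks(ys))


# A monotonic but nonlinear toy relationship (y = x cubed).
xs = [1, 2, 3, 4, 5]
ys = [x ** 3 for x in xs]
print(spearman_rho(xs, ys), pearson_r(xs, ys))
```

On this toy data the rank-based coefficient reaches 1 while Pearson's r stays below it, which is the difference the paragraph above describes.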

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one counts the concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
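
The pair-counting definition can be sketched directly. This is the simple form often called tau-a, in which tied pairs count as neither concordant nor discordant; libraries such as SciPy default to the tau-b variant, which additionally corrects for ties, so results may differ when ties are present. The function name and toy data are illustrative.

```python
from itertools import combinations


def kendall_tau(xs, ys):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    concordant = discordant = 0
    pairs = list(combinations(range(len(xs)), 2))
    for i, j in pairs:
        # A pair is concordant when both variables order it the same way.
        sign = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    return (concordant - discordant) / len(pairs)


# Toy data: two of the three pairs are concordant, one is discordant.
print(kendall_tau([1, 2, 3], [1, 3, 2]))
```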

Phik (φk)

Phik (φk) is a correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient when the input follows a bivariate normal distribution. The Phik project provides extensive documentation.

Missing values

Sample

First rows

index | acousticness | artists | danceability | duration_ms | energy | explicit | id | instrumentalness | key | liveness | loudness | mode | name | popularity | release_date | speechiness | tempo | valence | year
0 | 0.732 | ['Dennis Day'] | 0.819 | 180533 | 0.341 | 0 | 7xPhfUan2yNtyFG0cUWkt8 | 0.000000 | 7 | 0.1600 | -12.441 | 1 | Clancy Lowered the Boom | 8 | 1921 | 0.4150 | 60.936 | 0.9630 | 1921
1 | 0.982 | ['Sergei Rachmaninoff', 'James Levine', 'Berliner Philharmoniker'] | 0.279 | 831667 | 0.211 | 0 | 4BJqT0PrAfrxzMOxytFOIz | 0.878000 | 10 | 0.6650 | -20.096 | 1 | Piano Concerto No. 3 in D Minor, Op. 30: III. Finale. Alla breve | 5 | 1921 | 0.0366 | 80.954 | 0.0594 | 1921
2 | 0.996 | ['John McCormack'] | 0.518 | 159507 | 0.203 | 0 | 5uNZnElqOS3W4fRmRYPk4T | 0.000000 | 0 | 0.1150 | -10.589 | 1 | The Wearing of the Green | 6 | 1921 | 0.0615 | 66.221 | 0.4060 | 1921
3 | 0.982 | ['Sergei Rachmaninoff', 'James Levine', 'Berliner Philharmoniker'] | 0.279 | 831667 | 0.211 | 0 | 1SCWBjhk5WmXPxhDduD3HM | 0.878000 | 10 | 0.6650 | -20.096 | 1 | Piano Concerto No. 3 in D Minor, Op. 30: III. Finale. Alla breve | 4 | 1921 | 0.0366 | 80.954 | 0.0594 | 1921
4 | 0.957 | ['Phil Regan'] | 0.418 | 166693 | 0.193 | 0 | 4d6HGyGT8e121BsdKmw9v6 | 0.000002 | 3 | 0.2290 | -10.096 | 1 | When Irish Eyes Are Smiling | 4 | 1921 | 0.0380 | 101.665 | 0.2530 | 1921
5 | 0.957 | ['Phil Regan'] | 0.259 | 186467 | 0.212 | 0 | 0Nk5f07H3JaEunGrYfbqHM | 0.000222 | 2 | 0.2360 | -13.300 | 1 | Come Back To Erin | 2 | 1921 | 0.0358 | 85.726 | 0.2180 | 1921
6 | 0.992 | ['Sergei Rachmaninoff', 'Zubin Mehta', 'Vladimir Feltsman', 'Israel Philharmonic Orchestra'] | 0.241 | 167867 | 0.109 | 0 | 2Cfk7LOOiTLVxe9BNoGGCL | 0.853000 | 1 | 0.0771 | -19.286 | 1 | Rhapsody on a Theme of Paganini, Op. 43: Var. XVIII, Andante cantabile | 3 | 1921 | 0.0377 | 79.974 | 0.0379 | 1921
7 | 0.967 | ['Frank Parker'] | 0.275 | 210000 | 0.309 | 0 | 3ftBPsC5vPBKxYSee08FDH | 0.000028 | 5 | 0.3810 | -9.316 | 1 | Danny Boy | 3 | 1921 | 0.0354 | 100.109 | 0.1650 | 1921
8 | 0.941 | ['Dennis Day'] | 0.241 | 196307 | 0.274 | 0 | 4aVy85Y2sxMwIKmAcimHp0 | 0.000008 | 0 | 0.0984 | -9.750 | 0 | How Can You Buy Killarny | 3 | 1921 | 0.0297 | 90.773 | 0.2120 | 1921
9 | 0.993 | ['Sergei Rachmaninoff'] | 0.389 | 218773 | 0.088 | 0 | 02GDntOXexBFUvSgaXLPkd | 0.527000 | 1 | 0.3630 | -21.091 | 0 | Morceaux de fantaisie, Op. 3: No. 2, Prélude in C-Sharp Minor. Lento | 2 | 1921 | 0.0456 | 92.867 | 0.0731 | 1921

Last rows

index | acousticness | artists | danceability | duration_ms | energy | explicit | id | instrumentalness | key | liveness | loudness | mode | name | popularity | release_date | speechiness | tempo | valence | year
168582 | 0.000121 | ['Silverstein', 'Aaron Gillespie'] | 0.441 | 163720 | 0.8680 | 0 | 1aWpVsNjEiqpFvV3KhJiWu | 0.000003 | 11 | 0.1450 | -5.516 | 0 | Infinite | 63 | 2020-01-08 | 0.0714 | 162.000 | 0.3660 | 2020
168583 | 0.000499 | ['Nothing But Thieves'] | 0.621 | 237053 | 0.6660 | 0 | 3CauBZqN2EuHTJo4sSpjbS | 0.076800 | 6 | 0.0797 | -7.079 | 1 | Is Everybody Going Crazy? | 69 | 2020-03-18 | 0.0544 | 123.974 | 0.5790 | 2020
168584 | 0.057000 | ['Dalex'] | 0.602 | 192535 | 0.7280 | 0 | 3rBfD2swVRrL2yoqnTh2Fx | 0.001110 | 6 | 0.0797 | -2.869 | 1 | Matemáticas | 73 | 2020-03-19 | 0.2370 | 167.872 | 0.5130 | 2020
168585 | 0.013600 | ['The Strokes'] | 0.377 | 293360 | 0.9220 | 0 | 6e2pJqucDMxbp061B40r6O | 0.204000 | 0 | 0.2800 | -3.400 | 1 | Bad Decisions | 66 | 2020-02-18 | 0.0560 | 153.404 | 0.3940 | 2020
168586 | 0.535000 | ['Bruno Major'] | 0.806 | 235427 | 0.3620 | 0 | 7EncNzyGs8uJiQittB38ef | 0.045000 | 7 | 0.1110 | -10.381 | 1 | The Most Beautiful Thing | 66 | 2020-05-01 | 0.0345 | 127.502 | 0.4200 | 2020
168587 | 0.643000 | ['Kelly Clarkson'] | 0.481 | 205787 | 0.3680 | 0 | 5Y5b0wokJqZxaefDVRm43v | 0.000000 | 7 | 0.1250 | -8.310 | 0 | Born to Die | 58 | 2020-03-13 | 0.0300 | 140.384 | 0.5070 | 2020
168588 | 0.670000 | ['JoJo'] | 0.661 | 173760 | 0.5800 | 1 | 7wrBKecywD6TUuHQl3wjmj | 0.000055 | 0 | 0.1170 | -7.718 | 1 | Man | 63 | 2020-03-13 | 0.1240 | 72.009 | 0.5660 | 2020
168589 | 0.979000 | ['S.J Morgan'] | 0.502 | 160125 | 0.0355 | 0 | 67zZV7JzIx9Wcf65Ztt4CJ | 0.867000 | 0 | 0.1060 | -26.940 | 1 | Rivers | 66 | 2020-02-28 | 0.0379 | 160.033 | 0.2820 | 2020
168590 | 0.672000 | ['Childish Gambino'] | 0.174 | 179387 | 0.0466 | 0 | 4Ksj9mKfsYC5b8v8Ey3c8I | 0.196000 | 3 | 0.4200 | -18.458 | 1 | 0.00 | 62 | 2020-03-22 | 0.0495 | 116.713 | 0.0337 | 2020
168591 | 0.068600 | ['Alina Baraz', '6LACK'] | 0.615 | 141941 | 0.7130 | 0 | 1fm1PmtAvxXe98n8xOcKnJ | 0.001440 | 2 | 0.1540 | -3.539 | 1 | Morocco (feat. 6LACK) | 62 | 2020-03-09 | 0.0355 | 78.033 | 0.5140 | 2020